large language model
OpenAI proposes handing U.S. government a 5% stake, report says
OpenAI has discussed giving the U.S. government a 5% stake, the Financial Times reported on Thursday, as artificial intelligence firms face scrutiny in Washington over the likely misuse of advanced models and whether Americans would benefit from the industry's massive valuations. The ChatGPT creator has proposed that other U.S. AI firms also give Washington similar stakes, although it is unclear whether they would agree, the report said, citing two people familiar with the talks. The move follows growing public backlash in the U.S. over AI's potential to cause economic upheaval, including layoffs, and could help OpenAI sweeten ties with an administration that is increasingly taking an active role in regulating the technology.
A new, inexpensive Chinese AI model is catching up with Anthropic, OpenAI on their home turf
Zhipu's AI service on the web, dubbed Z.ai. BEIJING/BENGALURU - Since DeepSeek shocked markets early last year with its cheap but powerful artificial intelligence model, global consumers have been faced with a choice: Chinese offerings with lower prices and less capability or OpenAI or Anthropic, which have poured billions into development. A model called GLM-5.2, launched last month by Beijing-based startup Z.ai, may finally be closing that gap in terms of Western interest. GLM-5.2 has Silicon Valley buzzing with its coding and agent capabilities, or the ability to execute complex tasks with minimal prompting, that almost rival leading U.S. offerings at a fraction of the cost, in what some experts are calling a "mini DeepSeek moment."
How to Allocate Your Tokens? Scaling Laws with Training Steps and Batch Size
We propose a scaling law that takes into account model size and training data while explicitly splitting the latter into training steps and batch size (which we call the three-term law). Fitting the proposed law on a large set of training runs, we find that it correctly recovers the scaling of the optimal batch size. Moreover, because it makes use of training runs with suboptimal batch size, our proposed law can be robustly fit with significantly fewer training runs. We further show that the three-term law can be used to derive scaling laws for suboptimal batch sizes, and that it matches previous empirical findings related to the critical batch size.
The Dual Nature of LLM Persona: Aggregated Tendencies and Frame-Dependent Geometry
Evaluations of LLM personas via psychometric questionnaires typically rely on aggregate scores, discarding within-instance correlation structure. We test whether this geometric structure is intrinsic or frame-dependent. Constructing within-instance correlation matrices from IPIP-50 responses, we analyze geometry on SPD manifolds under manipulated question orderings in GPT-4o simulating American and Chinese-American personas. We find that persona expression comprises two dissociable components: aggregated features (Big Five scores) degrade under randomization (21% drop) but are frame-robust; geometric features (SPD manifold) collapse under frame misalignment (42% drop) but recover substantially (to 84%) under shared frames, surpassing aggregated features (76%). This collapse-recovery pattern reveals that persona geometry is not intrinsic but a frame-dependent coordination pattern encoding information invisible to aggregation. Our findings establish a dual-nature framework for LLM personas, frame-dependent geometry versus frame-robust aggregates, necessitating frame-aware evaluation and challenging static trait conceptions.
Online Safety Monitoring for LLMs
Mona Schirmer, Metod Jazbec, Alexander Timans, Christian Naesseth, Maja Waldron, Eric Nalisnick
We deploy a simple statistical framework based on risk control (Angelopoulos et al., 2022) that converts any safety signal into a binary decision rule, and offers statistical guarantees on the false alarm or missed detection rate. The framework is universally applicable to different monitoring purposes and can leverage arbitrary proxy signals. Through experiments on mathematical problem solving and red teaming conversations, we ... [...] into our everyday lives as search engines (Jin et al., 2025; Xiong et al., 2024), coding assistants (Zhao et al., 2023), and companions (Zhang et al., 2025a). As their applicability grows, so does the potential harm caused by malicious LLM outputs. Despite remarkable performance across a wide range of tasks, LLMs remain prone to generating hallucinated, factually incorrect (Ravichander et al., 2025), or harmful output (Yu et al., 2025) when deployed.
This Star-Studded Movie Cost 40 Million to Make. It Hasn't Been Released Yet. The Reason Why Is Nefarious.
The drama reveals just how deeply Silicon Valley has sunk its claws into Hollywood.
The Download: a startup has a solution for AI's groupthink problem
Plus: Scientists say they have built a cell from scratch for the first time. LLMs are stuck in a groupthink groove. This startup is trying to get them out. Open up your chatbot of choice--Claude, ChatGPT, Gemini--and type "Give me a random number between 1 and 10." You're going to get 7. Almost always. That won't work every time--but if it did for you, you may wonder if I have superpowers. The truth is that most large language models are stuck in a rut.
Can Microsoft's productivity apps survive the age of AI?
PCWorld examines whether Microsoft's core productivity apps like Word, Excel, and PowerPoint can withstand disruption from advancing AI technology. External AI applications such as ChatGPT and Claude now offer similar document formatting, content creation, and synthesis capabilities that rival Microsoft's own Copilot feature. The analysis suggests Microsoft's traditional productivity suite may become obsolete as AI chatbots increasingly handle tasks previously requiring dedicated office applications. Are Microsoft's core productivity apps -- Word, Excel, and PowerPoint -- endangered by the rise of AI? That's the point that Bloomberg and its sources addressed in coverage this week, noting that Microsoft is being buffeted by AI disruption as its stock plunges. "Whether Microsoft Word or Excel will be rendered obsolete by AI remains to be seen," said Jack Ablin, chief investment strategist at Cresset Wealth Advisors, which owns the stock, according to Bloomberg. "We don't know what the environment is going to look like in a few years, which opens up very real questions like, will we even use a Microsoft suite anymore?" Keith Fitz-Gerald, principal at the Fitz-Gerald Group, added.
OpenAI reportedly wants all AI companies to give the US government a stake in their businesses
Sam Altman is in talks with the US government in a bid to clear political hurdles, says the Financial Times. OpenAI's Sam Altman has reportedly been in talks with the US government to ensure his company's path towards achieving its goals remains free of political hurdles. According to the Financial Times, Altman has suggested giving the government a five percent stake in the company, in order to share the spoils of the AI boom with the public. But his idea doesn't only involve OpenAI: Under his proposal, other top AI companies like Google, Anthropic, xAI and Meta would have to agree to give the government a similar stake in their businesses. AI companies like Anthropic and OpenAI have recently encountered roadblocks from the US government when it came to releasing their latest AI models.
Adaptive parallel reasoning: the next paradigm in efficient inference scaling
What if a reasoning model could decide when to decompose and parallelize independent subtasks, how many concurrent threads to spawn, and how to coordinate them based on the problem at hand? We provide a detailed analysis of recent progress in the field of parallel reasoning, especially adaptive parallel reasoning. Disclosure: this post is part landscape survey, part perspective on adaptive parallel reasoning. One of the authors (Tony Lian) co-led ThreadWeaver (Lian et al., 2025), one of the methods discussed below. The authors aim to present each approach on its own terms. Recent progress in LLM reasoning capabilities has been largely driven by inference-time scaling, in addition to data and parameter scaling (OpenAI et al., 2024; DeepSeek-AI et al., 2025). Models that explicitly output reasoning tokens (through intermediate steps, backtracking, and exploration) now dominate math, coding, and agentic benchmarks.